TL;DR: Mining Reddit to Learn Automatic Summarization
Abstract
Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data. We propose a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a “TL;DR” to long posts. A case study using a large Reddit crawl yields the Webis-TLDR-17 corpus, complementing existing corpora primarily from the news genre. Our technique is likely applicable to other social media sites and general web crawls.
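The abstract describes the core idea: find posts that end with an author-written "TL;DR" and treat the text before it as the document and the text after it as its summary. The exact detection and filtering rules used to build the corpus are not given here, so the following is only a minimal sketch under our own assumptions. The function `split_tldr`, the regex `TLDR_PATTERN`, and the 20-word minimum content length are hypothetical and are not the authors' implementation or thresholds.

```python
import re
from typing import Optional, Tuple

# Hypothetical marker pattern (an assumption, not the paper's rule):
# matches "TL;DR", "tl;dr", "TLDR", "tl dr", "TL:DR" as a standalone token.
TLDR_PATTERN = re.compile(r"\btl\s*[;:]?\s*dr\b", re.IGNORECASE)


def split_tldr(post: str, min_content_words: int = 20) -> Optional[Tuple[str, str]]:
    """Split a post into (content, summary) at its last TL;DR marker.

    Returns None if there is no marker, the summary is empty, the content
    is shorter than min_content_words, or the summary is not shorter than
    the content. All thresholds are illustrative assumptions.
    """
    matches = list(TLDR_PATTERN.finditer(post))
    if not matches:
        return None
    last = matches[-1]  # the TL;DR is assumed to be appended at the end

    content = post[: last.start()].strip()
    # Drop separators that commonly follow the marker, e.g. ":" or "-".
    summary = post[last.end():].lstrip(" \t:;-").strip()

    content_words = content.split()
    if not summary or len(content_words) < min_content_words:
        return None
    # A summary should be shorter than the text it summarizes.
    if len(summary.split()) >= len(content_words):
        return None
    return content, summary


if __name__ == "__main__":
    post = ("word " * 25) + "TL;DR: short summary here"
    print(split_tldr(post))
```

Applied to every post in a crawl, a filter like this yields (content, summary) pairs; a real pipeline would likely need further heuristics (for example, handling a TL;DR placed at the top of a post, or quoted TL;DRs from other users), which this sketch deliberately leaves out.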
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
TL;DR: Learning Long Term Dependencies with Deep Logarithmic Residual LSTMs
We present logarithmically spaced residual connections, a modification to the existing encoder-decoder LSTM architecture for sequence-to-sequence learning problems that combats vanishing gradients and facilitates learning in otherwise unsolved long-term seq-to-seq problems. Our scheme effectively allows backpropagation to train on sequences of length N as if they were of length log N; our simplest sche...
TL;DR: Improving Abstractive Summarization Using LSTMs
Traditionally, summarization has been approached through extractive methods. However, they have produced limited results. More recently, neural sequence-to-sequence models for abstractive text summarization have shown more promise, although the task still proves to be challenging. In this paper, we explore current state-of-the-art architectures and reimplement them from scratch. We begin with a ...
An Approach for Concept-based Automatic Multi-Document Summarization using Machine Learning
Text Summarization is compressing the source text into a shorter version preserving its information content and overall meaning. It is very complicated for human beings to manually summarize large documents of text. Text summarization plays an important role in the area of natural language processing and text mining. Many approaches use statistics and machine learning techniques to extract sent...
Systematic literature review of fuzzy logic based text summarization
“Information Overload” is not a new term, but with the massive development in technology, which enables anytime, anywhere, easy and unlimited access, participation & publishing of information has consequently escalated its impact. Assisting users' informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...
Publication date: 2017